Approaches for Learning Constraint Dependency Grammar from Corpora

Author

  • M. P. Harper
Abstract

This paper evaluates two methods of learning constraint dependency grammars from corpora: one uses the sentences directly, and the other uses subgrammar-expanded sentences. Learning curves and test set parsing results show that grammars generated directly from sentences have a low degree of parse ambiguity, but at the cost of a slow learning rate and less grammar generality. Augmenting these sentences with subgrammars dramatically improves the grammar learning rate and generality with very little increase in parse ambiguity.

Similar resources

Syntactically annotated corpora of Estonian

Syntactically annotated corpora are needed 1) to train and test parsers and various language technological products (grammar checkers, information retrievers and extractors, machine translators, etc.); 2) to check the agreement of existing linguistic theories with real language usage. The corpora can be annotated at different levels of depth. In shallow syntactically annotated corpora a syntact...

PAC Learning Constraint Dependency Grammar Constraints

Constraint Dependency Grammar (CDG) [11, 13] is a constraint-based grammatical formalism that has proven effective for processing English [5] and improving the accuracy of spoken language understanding systems [4]. However, prospective users of CDG face a steep learning curve when trying to master this powerful formalism. Therefore, a recent trend in CDG research has been to try to ease the burden ...

Learning Probabilistic Dependency Grammars from Labeled Text

We present the results of experimenting with schemes for learning probabilistic dependency grammars for English from corpora labelled with part-of-speech information. We intend our system to produce wide-coverage grammars which have some resemblance to the standard context-free grammars of English which grammarians and linguists commonly exhibit as examples.

Learning Language from a Large (Unannotated) Corpus

A novel approach to the fully automated, unsupervised extraction of dependency grammars and associated syntax-to-semantic-relationship mappings from large text corpora is described. The suggested approach builds on the authors’ prior work with the Link Grammar, RelEx and OpenCog systems, as well as on a number of prior papers and approaches from the statistical language learning literature. If ...

Extracting a Tree Adjoining Grammar from the Penn Arabic Treebank

Much progress in natural language processing (NLP) over the last decade has come from the combination of using corpora of annotated naturally occurring text along with machine learning algorithms. Following this trend, corpora have been created for other languages, such as the Penn Arabic Treebank (PATB) (Maamouri et al. 2003). However, the corpora almost invariably need to be reinterpreted for the...

Publication date: 2003